Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: row ID generation for MVCC bitmap filter #6458

Merged
merged 66 commits into from
Jan 31, 2023

Conversation

JinheLin
Copy link
Contributor

@JinheLin JinheLin commented Dec 8, 2022

What problem does this PR solve?

Issue Number: ref #6296

Problem Summary:

  • In order to build a MVCC bitmap filter, we need to scan three columns of handle, version and tag.
  • Iterating through the input stream of Segment::getInputStream can obtain an ordered data stream.
  • However, the current input stream does not contain the position of the record on the array, so a "virtual column" needs to be added to indicate the record position. This "virtual column" is called SegmentRowIdCol.
  • In order to generate the SegmentRowIdCol, the read interfaces of Delta and Stable need to return the 'row position' of returned data.

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Dec 8, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Lloyd-Pottiger
  • flowbehappy

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 8, 2022
@JinheLin JinheLin mentioned this pull request Dec 8, 2022
4 tasks
@JinheLin
Copy link
Contributor Author

JinheLin commented Dec 8, 2022

/run-all-tests

Comment on lines 160 to 172
auto [read_offset, read_rows] = column_file_reader->readRows(output_columns, rows_start_in_file, rows_in_file_limit, range);
actual_read += read_rows;
if (row_ids != nullptr)
{
auto rows_before_cur_file = file_index == 0 ? 0 : column_file_rows_end[file_index - 1];
auto start_row_id = read_offset + rows_before_cur_file;
for (size_t i = 0; i < read_rows; ++i)
{
row_ids->push_back(start_row_id + i);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually, when MVCC, the remaining rows are more than the filter-out rows, maybe we can record the filter-out row_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading here also need to go to the upper layer for calculation, record the filter-out row_id does not meet the current logic.

@JinheLin JinheLin force-pushed the bitmap-filter-seg-row-id branch from 09c8536 to d9f64e7 Compare December 16, 2022 05:40
@JinheLin
Copy link
Contributor Author

/run-all-tests

@Lloyd-Pottiger
Copy link
Contributor

/build

@Lloyd-Pottiger
Copy link
Contributor

/run-integration-test

@JinheLin JinheLin force-pushed the bitmap-filter-seg-row-id branch from 10bf844 to 7b5f213 Compare December 19, 2022 11:55
Copy link
Contributor

@Lloyd-Pottiger Lloyd-Pottiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is mixed with bitmap logic now.

auto & segment_snap = bitmap_filter->snapshot();
auto enable_handle_clean_read = !hasColumn(columns_to_read, EXTRA_HANDLE_COLUMN_ID);
constexpr auto is_fast_scan = true;
auto enable_del_clean_read = !hasColumn(columns_to_read, TAG_COLUMN_ID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enable_del_clean_read is always false. TiDB Planner is not aware of the tag column, so columns_to_read never contain tag column.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 28, 2022
@JinheLin JinheLin force-pushed the bitmap-filter-seg-row-id branch from 3f0e3a5 to 98f728f Compare December 30, 2022 09:26
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 31, 2022
@JinheLin
Copy link
Contributor Author

JinheLin commented Jan 4, 2023

/run-all-tests

@JinheLin
Copy link
Contributor Author

JinheLin commented Jan 4, 2023

/run-unit-tests

@JaySon-Huang JaySon-Huang force-pushed the bitmap-filter-seg-row-id branch from 03ceb6b to f8a3bc3 Compare January 9, 2023 08:36
@sre-bot
Copy link
Collaborator

sre-bot commented Jan 9, 2023

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 10, 2023
@JinheLin JinheLin force-pushed the bitmap-filter-seg-row-id branch from bb20cd3 to 220b51b Compare January 11, 2023 07:23
@JinheLin
Copy link
Contributor Author

/run-all-tests

@JinheLin
Copy link
Contributor Author

/run-all-tests

@JinheLin
Copy link
Contributor Author

/run-integration-test

@flowbehappy flowbehappy self-requested a review January 30, 2023 03:59
Comment on lines +2429 to +2439
BlockInputStreamPtr stream = segment_snap->stable->getInputStream(dm_context,
columns_to_read,
read_ranges,
filter,
max_version,
expected_block_size,
/*enable_handle_clean_read*/ false,
/*is_fast_scan*/ false,
/*enable_del_clean_read*/ false,
/*read_packs*/ some_packs,
/*need_row_id*/ true);
Copy link
Contributor

@Lloyd-Pottiger Lloyd-Pottiger Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth to do this now. We do not need to createSegmentRowIdCol in ConcatSkippableBlockInputStream. When the parent of ConcatSkippableBlockInputStream is DeltaMergeBlockInputStream, there is no need to createSegmentRowIdCol since DeltaMergeBlockInputStream will do that. And the parent of ConcatSkippableBlockInputStream is DMRowKeyFilterBlockInputStream, we can call read(FilterPtr, true) to return filter, and we can use filter to generate bitmap.

Copy link
Contributor

@flowbehappy flowbehappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 30, 2023
Copy link
Contributor

@Lloyd-Pottiger Lloyd-Pottiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others LGTM. We can add some comments about previous discussion. Leave it to be optimized.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jan 31, 2023
@JinheLin
Copy link
Contributor Author

The others LGTM. We can add some comments about previous discussion. Leave it to be optimized.

Can you write an issue about it? @Lloyd-Pottiger

@JinheLin
Copy link
Contributor Author

/run-all-tests

@Lloyd-Pottiger
Copy link
Contributor

Lloyd-Pottiger commented Jan 31, 2023

The others LGTM. We can add some comments about previous discussion. Leave it to be optimized.

Can you write an issue about it? @Lloyd-Pottiger

ok, I will open an issue after this pr merged.

@JinheLin
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

@JinheLin: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 097d928

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jan 31, 2023
@ti-chi-bot ti-chi-bot merged commit 4a118fc into pingcap:master Jan 31, 2023
ywqzzy pushed a commit to ywqzzy/tiflash_1 that referenced this pull request Feb 13, 2023
@JinheLin JinheLin deleted the bitmap-filter-seg-row-id branch October 26, 2023 09:05
@JaySon-Huang JaySon-Huang mentioned this pull request Jun 4, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants